
    Extracting Disease-Symptom Relationships by Learning Syntactic Patterns from Dependency Graphs

    Disease-symptom relationships are of primary importance for biomedical informatics, but the databases that catalog them are incomplete compared with the state of the art available in the scientific literature. In this paper we propose a novel method for automatically extracting disease-symptom relationships from text, called SPARE (Syntactic PAttern for Relationship Extraction). The method consists of three successive steps: first, we learn patterns from dependency graphs; second, we select the best patterns based on their quality and specificity (their ability to identify only disease-symptom relationships); finally, we apply the selected patterns to new texts to extract disease-symptom relationships. We evaluated SPARE on a corpus of 121,796 PubMed abstracts related to 457 rare diseases. Extraction quality was evaluated as a function of pattern quality and specificity. The best F-measure obtained is 55.65% (for specificity ≥ 0.5 and quality ≥ 0.5). To provide insight into the novelty of the extracted disease-symptom relationships, we compare our results with the content of phenotype databases (OrphaData and OMIM). Our results show the feasibility of automatically extracting disease-symptom relationships, including true relationships that were not already referenced in phenotype databases and that may involve complex symptom descriptions.

    Thematic sheets construction of scientific publications using semantic annotation of scientific publications : Application to biomedical papers.

    Multi-document thematic sheets are considered an organized and structured textual representation of textual segments. The construction of these sheets is based on the semantic annotation of scientific publications according to a set of discursive categories called search viewpoints (such as speculation, results, or conclusions, ...). The semantic annotation is performed automatically by the Contextual Exploration process, a computational linguistic method based on a set of linguistic markers associated with search viewpoints. This method is implemented by a semantic annotation engine. To assess the relevance of our system's results, we evaluated the automatic annotations on biological papers. The concept of speculation (plausible hypothesis), specifically described in this work, was evaluated on the BioScope corpus, which is manually annotated for speculation and negation. We propose an application that allows users to obtain thematic sheets organized according to semantic criteria configurable by the user.

    Discursive mining viewpoints in building multi-document synthesized sheets

    Multi-document sheets are viewed as semantically structured representations of textual documents. The automatic construction of these sheets is based on the automatic annotation of textual documents according to a set of discursive categories called discursive mining viewpoints. The automatic annotation of a text is performed using Contextual Exploration processing, a linguistic and computational method implemented in the EXCOM2 platform that allows the annotation of segments (which can be a title, a paragraph, a sentence, or a clause) according to a given discursive mining viewpoint.